The issue of the severity of psychiatric disorders has great clinical importance. For example, severity influences decisions about level of care,
and affects decisions to seek government assistance due to psychiatric disability. Controversy exists as to the efficacy of antidepressants across
the spectrum of depression severity, and whether patients with severe depression should be preferentially treated with medication rather than
psychotherapy. Measures of severity are used to evaluate outcome in treatment studies and may be used as meaningful endpoints in clinical
practice. But what does it mean to say that someone has a severe illness? Does severity refer to the number of symptoms a patient is experiencing?
To the intensity of the symptoms? To symptom frequency or persistence? To the impact of symptoms on functioning or on quality of life?
To the likelihood of the illness resulting in permanent disability or death? Putting aside the issue of how severity should be operationalized,
another consideration is whether severity should be conceptualized similarly for all illnesses or be disorder specific. In this paper, we examine
how severity is characterized in research and contemporary psychiatric diagnostic systems, with a special focus on depression and personality
disorders. Our review shows that the DSM-5 has defined the severity of various disorders in different ways, and that researchers have adopted
a myriad of ways of defining severity for both depression and personality disorders, although the severity of the former was predominantly
defined according to scores on symptom rating scales, whereas the severity of the latter was often linked with impairments in functioning.
Because the functional impact of symptom-defined disorders depends on factors extrinsic to those disorders, such as self-efficacy, resilience,
coping ability, social support, cultural and social expectations, as well as the responsibilities related to one's primary role function and the
availability of others to assume those responsibilities, we argue that the severity of such disorders should be defined independently from functional
impairment.

The determination of illness severity has important clinical
implications. Depending on the disorder, severity affects decisions
to seek treatment, the type and intensity of treatment, and whether
to continue or stop treatment. Severity also impacts expectations in
the fulfillment of role function and disability status. Measures of
severity are used to evaluate outcome in treatment studies and may
be used as meaningful endpoints in clinical practice.

But, what does it mean to say that someone has a severe illness?
Of the various dictionary definitions of "severe", the one that is
most relevant to the characterization of illness is "of great
degree". This definition, however, does not convey what is meant
when an illness is considered "severe". Does severity refer to the
number of symptoms a patient is experiencing? To the intensity of
the symptoms? To symptom frequency or persistence? To the impact of
symptoms on functioning or quality of life? To the likelihood of the
illness resulting in permanent disability or death?

Some of these questions about the meaning of severity can be
further elaborated. For example, with regard to the prediction of
mortality, does severity allude to imminent death, death in the near
future, or death at any time in the future? Also, should the impact
of intervention be considered? That is, is an illness severe only
when death is likely if the illness is left untreated, or only if
death is likely regardless of intervention?

Perhaps severity determinations should be independent of
functional impact or prognosis and instead should be based on
structural or morphological changes and damage to the diseased
organ. To be sure, this is not relevant for many illnesses, but,
when it can be measured, should this be the guiding principle for
rating illness severity?

Putting aside the issue of how severity should be operationalized,
another consideration is whether severity should be conceptualized
similarly for all illnesses or be disorder specific. Should the
severity of heart failure, rheumatoid arthritis, diabetes, an acute
upper respiratory tract infection, and a headache be judged
according to a common standard or metric, or should each disorder
have its own respective guidelines for rating severity?

In this paper, we examine how severity is characterized in
psychiatric research and contemporary psychiatric diagnostic
systems. To illustrate some of the issues and controversies in
determining the severity of psychiatric disorders, we focus on
depression and personality disorders (PDs). The clinical
significance of considering the severity of depression is reflected
in official treatment guidelines wherein recommendations are based
on illness severity 1,2. The importance of considering the severity
of PDs is reflected by the ICD-11 proposal to replace the specified
criteria for different disorders by a single personality disorder
category that is graded according to levels of severity 3,4.

Before discussing the issue of severity of psychiatric disorders,
we present a brief overview of how severity has been conceptualized,
assessed and measured for various physical illnesses, highlighting
the variability of approaches.

SEVERITY OF PHYSICAL ILLNESSES

There is no consensus or uniform overriding principle in
distinguishing between levels of severity of physical illnesses. In
some cases, severity is defined by the degree of structural damage
to the diseased organ. For example, the severity of rheumatoid
arthritis has been defined according to radiographic evidence of
joint damage 5. The severity of diabetic retinopathy has been graded
according to the degree of retinal damage assessed in a direct
clinical eye exam 6. In a related manner, physiological measures
representing the impact of disease on the organ have been used to
characterize the severity of some diseases. For example, left
ventricular ejection fraction has been used as an index of the
severity of cardiovascular disease 7-10. Forced expiratory volume
has been used as an index of severity of cystic fibrosis 11.
Aminotransferase and bilirubin levels have been used to assess the
severity of hepatitis 12.

Sometimes severity is defined by a disorder-specific clinical
examination. For example, not only have radiographic assessments
been used to evaluate the severity of rheumatoid arthritis, but
severity has additionally been defined according to a count of the
number of swollen and painful joints 13.

Illness severity has also been defined more broadly to encompass
indices of the diseased organ as well as related and downstream
effects. In a study of the prognostic implications of post-cardiac
arrest illness severity, severity scores were based on
cardiopulmonary dysfunction and neurologic status 14,15. The
severity of sickle cell disease has been based on the presence and
frequency of complications such as renal failure, necrosis of hips
and shoulders, and gallstones 16. In studies of the severity of
chronic obstructive pulmonary disease, the BODE index (B, body mass
index; O, obstruction of airways as measured by forced expiratory
volume in one second; D, dyspnea scale; E, exercise capacity as
measured by a six-minute walk test) includes and goes beyond a
direct, specific assessment of pulmonary damage and has been found
to be a better predictor of mortality, hospitalization, quality of
life, and depression than forced expiratory volume alone 17. The
Unified Parkinson's Disease Rating Scale contains four subscales
assessing mental state, activities of daily living, motor
examination, and complications 18,19.

Moving further away from a direct or physiological assessment of
the diseased organ, the New York Heart Association Functional
Classification is a measure of cardiac disease severity based on
limitations in physical activities and the presence of physical
symptoms associated with varying degrees of activity 20.

In contrast to disorder-specific physical and physiological
indicators of severity, there are composite measures of overall
illness severity, such as the Acute Physiology and Chronic Health
Evaluation (APACHE) scores and the Simplified Acute Physiology Score
(SAPS), based on non-specific clinical and biological indicators of
health status such as body temperature, age, history of organ
failure, electrolytes, and hematocrit 21,22. These illness severity
measures have been used to predict mortality in heterogeneous and
single disorder samples of acutely ill emergency department and
hospitalized patients 23,24.

Finally, self-report questionnaires have been developed to assess
the severity of some physical illnesses. The severity of benign
prostatic hypertrophy as assessed by the American Urological
Association Symptom Index is based on the frequency of symptoms 25.
The Tinnitus Severity Index is based on the frequency of functional
impairment or psychological symptoms due to tinnitus 26. The Bowel
Symptom Severity Scale assesses the frequency, distress and
disability of symptoms associated with irritable bowel syndrome 27.
The severity of headaches as measured by the Headache Impact
Questionnaire is a composite measure of headache frequency, the
average pain intensity of headaches, and the impairment resulting
from headaches 28. The Liverpool Seizure Severity Scale assesses
perceptions of seizure control and severity of ictal and postictal
symptoms 29.

Clark et al 30 summarized the approach taken to develop
self-report measures of illness severity for six disease states
studied in the Veterans Health Study. They defined illness severity
in terms of patients' perceptions of the magnitude of symptoms or
complications of the illness that are associated with reductions in
health-related quality of life or health status. They distinguished
disease severity from the impact of disease (e.g., impairment, life
satisfaction, well-being), because the impact of disease is often
mediated by personal characteristics (e.g., resiliency,
self-efficacy) and social context.

SEVERITY OF PSYCHIATRIC DISORDERS AS DESCRIBED IN DSM-5

In contrast to some physical illnesses, there are no specific 
or non-specific biomarkers of psychiatric disorders that validly 
characterize the severity of the disorder. In the absence of such 
biological or structural indicators, researchers and clinicians 
are left to assess the epiphenomena of a psychiatric disorder 
to judge its severity. 

Discussions of resource allocation in the public health sector
often focus on patients with severe mental illness, though there is
no consensus on how to define such an illness 31,32. The DSM-5 33,
like its immediate predecessors, defines severity for only some
disorders. Table 1 lists the DSM-5 disorders with defined levels of
severity.

The DSM-5 approach towards defining severity varies across
disorders. The four severity levels of intellectual disability
(mild, moderate, severe, profound) are the most elaborately defined,
with three pages of descriptions of the adaptive functioning
deficits characteristic of each level of severity. DSM-5 notes that
severity was defined according to adaptive functioning level rather
than IQ scores because the former is a better determinant of the
level of supports that are needed. Similarly, the level of deficits
and functional impairment defining the severity of autism spectrum
disorders is linked to the supports required. The severity of
learning disorders refers to the difficulties in learning skills as
well as the likelihood of learning those skills with or without
intervention. For example, DSM-5 defines severe impairment of a
learning disorder as "severe difficulties learning skills, affecting
several academic domains, so that the individual is unlikely to
learn those skills without ongoing intensive individualized and
specialized teaching for most of the school years". For these
disorders, then, the severity specifier is explicitly linked to
suggested levels of intervention.

Table 1 Characterization of disorder severity in DSM-5

DSM-5 disorder | Features used to define severity
Major depressive disorder | Number of symptoms, level of distress caused by intensity of symptoms, and impairment in social and occupational functioning
Mania, hypomania | Same as major depressive disorder
Alcohol use disorder | Number of criteria
Drug use disorder | Number of criteria
Bulimia nervosa | Frequency of compensatory behaviors per week
Anorexia nervosa | Body mass index
Binge eating disorder | Frequency of eating binges
Learning disorders | Severity of deficit in learning skills and likelihood of learning the skills with or without intervention
Attention-deficit/hyperactivity disorder | Number of symptoms, severity of individual symptoms, or level of impairment caused by the symptoms
Intellectual disability | Level of adaptive functioning
Autism spectrum disorder | Degree of impairment in functioning due to deficits in verbal and nonverbal communication, inflexibility of behavior, difficulty coping with change, or restricted/repetitive behaviors
Stereotypic movement disorder | The ease with which the symptoms can be suppressed and the need for intervention to prevent serious injury
Psychotic disorders | Quantitative assessment on 5-point scale of primary feature of the psychosis (delusions, hallucinations, disorganized speech, abnormal psychomotor behavior, and negative symptoms). Rating is based on symptom intensity or subjective distress due to symptom
Reactive attachment disorder | Only the severe type is defined. Severe is defined as all criteria met at a high level
Disinhibited social engagement disorder | Only the severe type is defined. Severe is defined as all criteria met at a high level
Somatic symptom disorder | Number of criteria and somatic complaints
Psychological factors affecting other medical conditions | Degree of impact on medical condition or medical risk
Hypersomnolence disorder | Number of days per week with difficulty maintaining daytime alertness
Narcolepsy | Frequency of cataplexy and responsiveness of cataplexy to medication, number of naps per day, degree of disturbance of nocturnal sleep
Obstructive sleep apnea/hypopnea | Apnea/hypopnea index score
Nightmare disorder | Frequency of nightmares per week
Sexual disorders | Degree of distress related to symptoms
Premature ejaculation | Time to ejaculation
Substance/medication-induced sexual dysfunction | Percentage of occasions of sexual activity on which dysfunction occurs
Oppositional defiant disorder | Number of settings in which the symptoms occur
Conduct disorder | Number of conduct problems or the degree of harm caused to others
Neurocognitive disorders | Degree of difficulty with instrumental activities of daily living

Depression and mania are classified as mild, moderate or severe
according to the number of symptoms, the level of distress caused by
the intensity of the symptoms, and the degree of impairment in
social and occupational functioning. The severity of alcohol and
drug use disorders is based on the number of criteria that are met
(mild: 2 or 3 criteria; moderate: 4 or 5 criteria; severe: 6 or more
criteria). The severity of attention-deficit/hyperactivity disorder
is based on the number of symptoms, severity of individual symptoms,
or level of impairment caused by the symptoms. The severity of
bulimia nervosa is operationalized according to the number of
inappropriate compensatory behaviors per week (mild: 1-3; moderate:
4-7; severe: 8-13; extreme: 14 or more), though the severity
designation could be increased to reflect other symptoms or level of
functional impairment. For anorexia nervosa, severity is defined
according to body mass index, and for binge eating disorder it is
defined by the number of binge eating episodes per week, though,
similar to bulimia nervosa, the severity designation can be
increased to reflect other symptoms or degree of functional
impairment. Severity of sexual disorders is based on the level of
distress regarding the symptoms, except for premature ejaculation,
for which severity is based on the time to ejaculation. The severity
of cataplexy is based, in part, on lack of responsiveness to
medication.
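
The count-based thresholds quoted above can be written out as a short illustration. The Python sketch below encodes only the cut-offs stated in the text for alcohol and drug use disorders and for bulimia nervosa. The function names and the "below threshold" label are assumptions made for this example, and the sketch deliberately ignores the option of raising the bulimia nervosa designation to reflect other symptoms or functional impairment.

```python
def substance_use_severity(criteria_met: int) -> str:
    """Severity label for alcohol or drug use disorder from the criteria count."""
    if criteria_met < 0:
        raise ValueError("criteria_met must be non-negative")
    if criteria_met >= 6:
        return "severe"
    if criteria_met >= 4:
        return "moderate"
    if criteria_met >= 2:
        return "mild"
    # The text gives no severity level for fewer than 2 criteria.
    return "below threshold"  # label assumed for this sketch


def bulimia_severity(compensatory_behaviors_per_week: int) -> str:
    """Severity label for bulimia nervosa from weekly compensatory behaviors."""
    n = compensatory_behaviors_per_week
    if n < 0:
        raise ValueError("frequency must be non-negative")
    if n >= 14:
        return "extreme"
    if n >= 8:
        return "severe"
    if n >= 4:
        return "moderate"
    if n >= 1:
        return "mild"
    return "below threshold"  # label assumed for this sketch


print(substance_use_severity(5))  # moderate
print(bulimia_severity(9))        # severe
```

The two functions share a shape but not a scale: one counts diagnostic criteria and the other counts behaviors per week, which is the heterogeneity the overview describes.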

This brief overview illustrates the variability in the approaches
taken in the DSM-5 towards defining degrees of severity, with some
definitions emphasizing the number of criteria met, some others
emphasizing the core feature of the disorder, some based on level of
distress, and some focusing on response to intervention and
prediction of course. In contrast to many physical illnesses, none
of the definitions of severity refer to the likelihood of imminent
or distal mortality, and most definitions do not refer to prognosis
or future course. Rather, most definitions of severity in DSM-5
refer to the number of symptoms or criteria of the disorder, the
frequency of symptoms, and the level of impairment or distress.

SEVERITY OF DEPRESSION 

We focus on the severity of depression because it has received the
most extensive research. While the research has not been entirely
consistent, the severity of depression has been associated with
health-related quality of life 34, functional impairment 35,36,
suicidality 37-39, longitudinal course 40-43, and several biological
variables 44-46. Moreover, the severity of depression has been at
the core of controversies regarding the efficacy of treatment and
whether certain forms of treatment should be recommended as first
line interventions. Almost all research on severity is based on
scores on depression symptom scales, though most scales have been
developed without consideration as to how to best conceptualize and
assess the severity of depression.

Severity levels of depression in DSM-5 and ICD-10 

Three elements are used to define the severity levels of
depression in DSM-5: the number of symptoms, the level of distress
caused by the intensity of the symptoms, and the degree of
impairment in social and occupational functioning. The severity
categorization applies to all depressive disorders, not just major
depressive disorder (MDD). Mild depression is specified when "few,
if any, symptoms in excess of those required to make the diagnosis
are present, the intensity of the symptoms is distressing but
manageable, and the symptoms result in minor impairment in social or
occupational functioning". Severe depression is specified when "the
number of symptoms is substantially in excess of that required to
make the diagnosis, the intensity of the symptoms is seriously
distressing and unmanageable, and the symptoms markedly interfere
with social and occupational functioning". The DSM-5 does not
explicitly define moderate depression other than to say that the
number of symptoms, intensity of symptoms, and/or functional
impairment are between mild and severe.

There are some problems with the DSM-5 specification of severity
levels. The same definition of the severity specifier is used for
MDD and persistent depressive disorder. This is a problem, because
persistent depressive disorder requires fewer symptoms than does MDD
to meet the DSM-5 diagnostic threshold. Thus, a patient with
persistent depressive disorder who experiences the same number of
symptoms as a patient with MDD, and with similar levels of
functional impairment and distress, may be classified as more severe
because the symptom count may be "substantially in excess" of the
diagnostic threshold for persistent depressive disorder but not for
MDD.

Another problem with the DSM-5 severity specifier is that the
definition of functional impairment is limited to social or
occupational functioning. This is inconsistent with the wording of
the impairment criterion for the diagnosis of MDD and persistent
depressive disorder, which refers to impairment in social,
occupational, or other important areas of functioning. Thus,
individuals who maintain social contacts, are not expected to be
employed, but are unable to function as students or full-time
parents, could be misclassified as less severe than they actually
are.

While moderate severity is not specifically defined, the internal
logic of the wording of the moderate severity description has a
minor flaw. Mild depression requires low levels of symptoms,
distress and functional impairment. Conversely, severe depression
requires high levels of all three. Thus, moderate depression should
be defined as lying between the mild and severe levels in symptoms,
distress or functional impairment (using "or", not "and/or" as
DSM-5 defines it).

Finally, two other variables often considered important in 
discussions about depression severity - suicidality and need 
for hospitalization - are not considered in DSM-5’s definition 
of severity. 

What evidence supports the validity of the DSM-5 approach towards
defining severity in this manner? One study from a population-based
registry of twins who experienced a major depressive episode in the
year prior to the interview found that the three aspects of the
severity specifier - number of symptoms, severity of symptoms, and
degree of functional impairment - were significantly, albeit only
modestly, correlated 47. The authors concluded that the DSM severity
construct was multifaceted and heterogeneous.

A study of psychiatric outpatients with a mood disorder 48, 84% of
whom were in a major depressive episode, found that the number of
DSM-IV symptoms of MDD was weakly correlated with clinicians'
ratings on the Clinical Global Impression (CGI) 49 and the Global
Assessment of Functioning (GAF) 50. Moreover, the severity ratings
of some individual symptoms of depression were as highly correlated
with CGI and GAF scores as was the total number of depressive
symptoms. A small study of psychiatric inpatients with MDD found
that the number of MDD criteria was weakly correlated with the
Global Assessment Scale 51. Kessler et al 52 analyzed data from the
National Comorbidity Study (NCS) and found that, compared to
individuals who reported five or six MDD criteria during their worst
episode of depression, individuals who reported seven to nine MDD
criteria experienced more psychosocial impairment, more episodes of
depression, and greater chronicity. Wakefield and Schmitz 53,54
examined the NCS database as well as another epidemiological survey
and suggested that the number of depressive symptoms was less
important than the type of depressive symptoms and other features of
complicated depression in predicting future occurrence of a major
depressive episode, seeking professional help for depression, a
history of suicide attempt, and a history of psychiatric
hospitalization. Thus, symptom count does not seem to be an adequate
indicator of depression severity.

The ICD-10 55 designates three levels of severity - mild, moderate
and severe - based on number of symptoms, severity of symptoms,
functional impairment, level of distress and, indirectly, type of
symptoms. In contrast to DSM-5, there is no symmetry in the
descriptions of the three levels of severity. Mild depression refers
to the presence of two or three symptoms that are distressing though
the patient is likely to be able to continue with most activities.
Moderate depression requires four or more symptoms with the patient
having great difficulty continuing with ordinary activities. Severe
depression requires "several symptoms that are marked and
distressing, typically loss of self-esteem and ideas of
worthlessness or guilt. Suicidal thoughts and acts are common and a
number of 'somatic' symptoms are usually present".

As with the definition of the DSM-5 severity specifier, little
research has been done on the ICD-10 severity specifier, perhaps
because the reliability of making the severity distinctions is
poor 56. Poor reliability is not surprising, due to the
impreciseness of the severity level definitions 57.

The severity definitions in the official diagnostic systems have
not been used in treatment studies. Rather, in almost all those
studies, severity is designated by a score on a symptom rating
instrument - usually the Hamilton Depression Rating Scale (HAMD) 58
or the Montgomery-Åsberg Depression Rating Scale (MADRS) 59. Thus,
treatment studies generally do not consider other factors that have
been used to characterize severity, such as level of functional
impairment, degree of suicidality, or depressive subtype (i.e.,
presence of melancholic features or psychotic symptoms) 60,61.

Scales measuring the severity of depression 

The severity of depression has been most frequently quantified on
paper-and-pencil and clinician-administered rating scales. There is
variability amongst the instruments in the time frame covered (the
two most common time frames being the past one or two weeks), rating
guidelines (most scales use Likert-type ratings based on symptom
frequency, persistence or intensity), and item content.


Little research has examined which parameters provide the most
valid indicator of depression severity. Is the severity of
depression best conceptualized as the number of symptoms (i.e.,
present or absent), frequency of symptoms (e.g., every day vs. half
the days vs. few days), persistence of symptoms (e.g., always
present vs. often present vs. sometimes present), or intensity of
symptoms (e.g., severe vs. moderate vs. mild)? Williams et al 62, in
standardizing the scoring of the HAMD, created a grid scoring format
to incorporate information regarding symptom frequency/persistence
and intensity in the ratings. The only study to examine whether it
is important to consider both intensity and frequency constructs
found that symptom intensity was a better indicator of severity than
symptom frequency 63. In developing the Patient-Reported Outcomes
Measurement Information System (PROMIS) depression scale, Pilkonis
et al 64 reviewed studies comparing alternative response options and
concluded that frequency scaling outperformed intensity ratings,
though these were not studies of depression ratings. Thus, the most
valid rating format of depression severity scales is unsettled, and
has been little studied.

Should the content of a severity scale be based on the diagnostic
criteria for the disorder, include other symptoms of depression that
are not components of the diagnostic criteria (e.g., low
motivation), or include symptoms that are frequent in depressed
patients but are defining features of other disorders (e.g.,
anxiety, irritability)? And by what standard should one judge
whether one approach or scale is a more valid indicator of severity?
Statistical approaches such as item response theory have been used
to construct scales 65,66. While instruments derived from this
approach may be psychometrically superior to measures based on the
diagnostic criteria for MDD, such measures do not include symptoms
that have long been considered to be core components of depression,
such as appetite and sleep disturbances or suicidality. If a measure
of severity is to be utilized for clinical purposes, and not just
for administrative outcome measurement, it is important to include
vegetative symptoms, as the presence of these symptoms affects
medication selection 67, and to assess suicidality because of safety
concerns.

While there are differences amongst the scales in how they were
constructed, their intended purpose, item coverage, and rating
guidelines, the one commonality is that the overall severity of
depression is represented by the sum of the ratings of the
individual items. For all but a few scales, all items on the scale
are rated similarly and contribute equally to the total score. A
notable exception is the HAMD 58, which includes some items rated 0
to 2, and some others rated 0 to 4. To be sure, measures differ in
their emphasis on different content domains of depression 68. Some
measures have been criticized as being multidimensional, because a
unidimensional construct of depression severity is better able to
demonstrate treatment effects 69. However, all scales, even
multidimensional measures which yield subscale scores, as well as
instruments that were initially intended to screen for depression
rather than being used as indicators of severity, derive a total
score that has been used to denote the severity of depression.

The score summation approach is based on some assumptions that
have not been empirically supported. Adding up item scores to yield
a total score as an indicator of overall depression severity assumes
that all symptoms are equal indicators of the severity of
depression. However, the different symptoms of depression are not
similarly correlated with clinicians' global ratings of severity 48.
From the psychometric perspective, the rating options of individual
items should convey valid information across the entire spectrum of
severity 70. Thus, severely depressed patients should more
frequently receive the highest rating of a symptom than a low or
zero rating, whereas mildly depressed patients should more
frequently receive ratings indicating mild severity than the highest
rating of a symptom. Santor and Coyne 70, using item response theory
data analytic techniques, demonstrated that some of the items of the
HAMD do not meet these assumptions.

In fact, scales based on item frequency ratings are unlikely to
meet these assumptions and therefore may not be good measures of
severity. For example, the items on the 9-item Patient Health
Questionnaire (PHQ-9) are rated on a four-point scale of symptom
frequency during the past two weeks (0=not at all; 1=several days;
2=more than half the days; 3=nearly every day) 71. Patients with MDD
would be expected to score a 3 for most of the symptoms that are
present, because the definition of MDD requires symptom presence for
at least two weeks. Because of the ceiling effect, a patient with
MDD seen in primary care who continues to work would score similarly
to a depressed patient who is hospitalized because of difficulties
with self-care. While there are several studies of the PHQ-9 using
an item response theory approach, these have been of heterogeneous
non-depressed psychiatric, medical or community samples 72-78. We
are unaware of any studies evaluating the performance of the PHQ-9
items in a sample of depressed patients presenting for treatment. We
would predict that, in such a sample, some - perhaps many - items of
the PHQ-9 would be highly skewed because of the aforementioned
ceiling effect. No studies have examined the impact of different
rating guidelines on the operating characteristics of items on a
depression scale.
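
The ceiling effect described above can be made concrete with a small sketch. It assumes two hypothetical patients who both endorse the same five symptoms as present nearly every day. The patient data and the separate intensity field are invented for illustration and are not part of the PHQ-9; only the 0 to 3 frequency anchors and the summation of nine items come from the text.

```python
from typing import NamedTuple


class ItemRating(NamedTuple):
    frequency: int  # PHQ-9 rating: 0=not at all ... 3=nearly every day
    intensity: int  # hypothetical 0-3 intensity rating, NOT part of the PHQ-9


def phq9_total(items: list[ItemRating]) -> int:
    """Sum the nine frequency ratings; intensity plays no role in the score."""
    if len(items) != 9:
        raise ValueError("the PHQ-9 has nine items")
    if any(not 0 <= item.frequency <= 3 for item in items):
        raise ValueError("frequency ratings range from 0 to 3")
    return sum(item.frequency for item in items)


# Hypothetical outpatient who continues to work: five symptoms present
# nearly every day, each at low intensity.
working_outpatient = [ItemRating(3, 1)] * 5 + [ItemRating(0, 0)] * 4

# Hypothetical inpatient with self-care difficulties: the same five symptoms
# nearly every day, each at high intensity.
hospitalized_patient = [ItemRating(3, 3)] * 5 + [ItemRating(0, 0)] * 4

print(phq9_total(working_outpatient), phq9_total(hospitalized_patient))  # 15 15
```

Because every symptom that is present is rated at the top frequency anchor, the total cannot separate the two patients; any difference between them lies in information the frequency format does not collect.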

Implicit in the score summation approach is that low level ratings
across many symptoms reflect the same severity as high ratings
across fewer symptoms. For example, someone who indicates that, in
the past week, he/she has infrequently experienced low mood,
insomnia, low self-esteem, guilt, reduced concentration, fatigue,
psychomotor slowing, reduced appetite, impaired decision making, and
reduced interest in usual activities would be considered at the same
level of severity as someone who reports daily depressed mood,
guilt, feelings of inferiority, and suicidal thoughts, but denies
all somatic and vegetative symptoms of depression. Likewise, when
item ratings are based on symptom intensity, a mild intensity rating
of many symptoms is considered the same as a severe intensity rating
of a more limited number of symptoms.


The score summation approach, in which all items are 
weighted equally, is not grounded in a specific overriding con¬ 
ceptualization of severity. If illness severity is conceptualized 
in terms of mortality risk, then one would expect a measure of 
depression severity to weight more heavily item ratings of sui¬ 
cidal thoughts, hopelessness and psychomotor agitation than 
ratings of impaired concentration and fatigue. On the other 
hand, if illness severity is conceptualized in terms of functional 
impairment, then one might expect items assessing impaired 
concentration and fatigue to be weighted more heavily than 
items assessing appetite reduction or guilt. To be sure, some 
measures assess functional impairment along with symptomatology 63,71,79-81 . 
No symptom-based measure, however, has 
been constructed by examining the association of individual 
items with indices of functional impairment and including on 
the scale only those items that are independently associated 
with impairment. 
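
The contrast between equal weighting and a conceptually driven weighting can be sketched in a few lines of Python. The item names, the 0-3 ratings, and especially the weights are illustrative assumptions; they are not taken from any published scale and have not been derived empirically.

```python
# Hypothetical items rated 0-3; names and weights are illustrative assumptions.
ITEMS = [
    "depressed_mood", "guilt", "suicidal_thoughts", "hopelessness",
    "insomnia", "fatigue", "impaired_concentration", "reduced_appetite",
]

def total_score(ratings, weights=None):
    """Weighted sum of item ratings; with no weights, every item counts equally."""
    if weights is None:
        weights = {item: 1.0 for item in ratings}
    return sum(weights[item] * rating for item, rating in ratings.items())

# Profile A: a mild rating on every item.
profile_a = {item: 1 for item in ITEMS}

# Profile B: high ratings on a few items, all other items absent.
profile_b = {item: 0 for item in ITEMS}
profile_b.update({"depressed_mood": 3, "guilt": 2, "suicidal_thoughts": 3})

# Assumed weights for a mortality-risk conceptualization of severity.
risk_weights = {item: 1.0 for item in ITEMS}
risk_weights.update({"suicidal_thoughts": 3.0, "hopelessness": 2.0})

print(total_score(profile_a), total_score(profile_b))
print(total_score(profile_a, risk_weights), total_score(profile_b, risk_weights))
```

Under equal weighting the two profiles receive the same total, while the assumed risk weights rank the second profile as more severe. A different conceptualization (for example, functional impairment) would call for different weights and could reverse the ordering.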

Few studies have examined the association between sever¬ 
ity ratings of individual symptoms of depression and multiple 
external indicators of severity. Faravelli et al 48 found marked 
differences among symptoms in their association with CGI 
and GAF ratings. Moreover, the symptoms with the highest 
correlations with CGI ratings - such as depressed mood, psy¬ 
chic retardation, impaired concentration, and anhedonia - 
tended to have the highest correlations with GAF scores. 

Most discussions of the problems with depression scales 
have focused on their limitations as outcome measures 69,82,83 . 
However, different aspects of outcome measurement may be 
of interest, and these differences might result in different ap¬ 
proaches towards scale construction. Some measures of the 
severity of depression have been specifically designed to be 
sensitive to treatment effects 39,84 . Some measures are linked 
to the symptom criteria that are used to diagnose depres¬ 
sion 71,79,85,86 , whereas others assess a broad range of features 
that patients indicate are most important in measuring out¬ 
come 80 or assess a range of diagnostic and associated symp¬ 
toms of depression 87 . Descriptions of scale construction typ¬ 
ically focus on the content of the measure and rarely discuss 
the reason for choosing the rating format. For example, in de¬ 
veloping the Multidimensional Depression Assessment Scale, 
Cheung and Power 68 reviewed the content of fifteen depres¬ 
sion scales and how their scale would address a content gap. 
There was no discussion, however, of rating formats and why 
a symptom frequency format was chosen for their measure 
rather than a rating format assessing symptom intensity. 

One of the commonly used clinician rated measures of se¬ 
verity, the MADRS, was designed to be particularly sensitive to 
change in treatment trials 39 . Items were selected if they were 
prevalent in the patients at the beginning of treatment (i.e., 
prevalence greater than 70%), showed the greatest change from 
baseline to week 4 of treatment, and change in scores from 
baseline to week 4 on the symptom showed the greatest corre¬ 
lation with change in total scores on the measure. While there is 
nothing inherently wrong with constructing a measure in this 
manner for this purpose, this should not be the basis for selecting 
items on a measure of depression severity, as the resulting 
scale can be biased towards the inclusion of items that are par¬ 
ticularly sensitive to change for the medication(s) studied. The 
construction of the MADRS was based on response to mianser¬ 
in, maprotiline, amitriptyline, and clomipramine - medications 
that are not commonly used today. Using the same approach to 
construct a measure today, when different medications are pre¬ 
scribed, might produce a scale that only partially overlaps with 
the items included on the MADRS. In the same vein, the HAMD, 
which was published more than 50 years ago, has been criti¬ 
cized for including items that are most responsive to the effects 
of sedating medications such as tricyclic antidepressants 88 . 

So, while there are many rating scales of depression, and 
several studies examining them, questions remain as to how 
to judge if one measure is a more valid indicator of depression 
severity than another measure. Should it be based on psychometric 
analyses indicating unidimensionality? Would a "better" 
measure of severity be more highly correlated with indices 
of impairment? Be more highly correlated with current sui¬ 
cidal ideation? Be more highly predictive of future suicidal be¬ 
havior? Be more highly predictive of future mortality in gen¬ 
eral? Be more highly predictive of future course? Be better able 
to distinguish depressed patients who do and do not require 
hospitalization? Demonstrate a larger effect size in a treatment 
study? Have greater discriminative ability between depression 
and anxiety, and thus be a "purer" measure of depression? 

A problem with depression scales: uncertain validity of 
cutoffs to define severity groupings 

Putting aside the question of how to best conceptualize se¬ 
verity and construct a scale, a problem with the existing litera¬ 
ture on depression severity is the inconsistency in the cutoff 
scores on symptom scales used to demarcate levels of severity, 
particularly severe depression. The use of various cutoff scores 
to define severity groups makes it difficult to compare the 
studies on the treatment implications of severity. 

DeRubeis et al 89 conducted a mega-analysis of four studies 
comparing cognitive-behavioral therapy and medication, and 
defined severe depression as a cutoff of 20 or more on the 17- 
item HAMD. Likewise, the recent mega-analysis of placebo- 
controlled trials of fluoxetine and venlafaxine used a cutoff of 
20 to define severe depression 90 . Both of these studies cited 
the landmark study by Elkin et al 91 to justify their definition of 
severe depression. However, Elkin et al did not cite empirical 
evidence for this cutoff and, in fact, did not refer to the patients 
scoring above 20 on the HAMD in absolute terms (i.e., having 
severe depression), but instead referred to these patients in 
relative terms (i.e., having more severe depression than the 
patients scoring 20 and below). 

In Kirsch et al's 92 meta-analysis of the impact of severity on 
antidepressant-placebo differences, the authors noted that the 
mean baseline HAMD scores of the antidepressant efficacy 
trials were in the very severe range (i.e., ≥23) based on the 
American Psychiatric Association (APA)'s Handbook of Psychiatric 
Measures 93 for all but two of the 35 studies included in the 
analysis. In a prior analysis of antidepressant efficacy studies in 
the Food and Drug Administration (FDA) data base, Khan 
et al 94 divided the studies into three groups based on pre-treatment 
HAMD scores (≤24, 25-27, ≥28) without indicating the 
basis for using these cutoff scores to define the groups. Fournier 
et al 95 used the thresholds recommended in the APA's 
Handbook of Psychiatric Measures 93 to define grades of severity 
on the HAMD (mild to moderate: ≤18; severe: 19 to 22; very 
severe: ≥23). In contrast to these studies, and the APA guidelines, 
most pharmacotherapy studies have used a cutoff of 25 
on the 17-item HAMD to define severe depression 96-101 and 
this cutoff has been recommended by several experts 102-104 . 
Thus, severe depression has not been consistently defined. 
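
The practical consequence of these competing conventions can be shown with a short Python sketch. The three lower bounds are those mentioned above (20, 19 and 25 on the 17-item HAMD); treating each cutoff as inclusive, and the dictionary keys used to label the conventions, are assumptions made for illustration.

```python
# Lower bounds for "severe" depression on the 17-item HAMD under three
# conventions mentioned in the text. Treating each bound as inclusive (>=)
# is an assumption; the key names are illustrative labels only.
SEVERE_CUTOFFS = {
    "cutoff_20": 20,           # mega-analyses citing Elkin et al
    "apa_handbook": 19,        # APA Handbook: severe range begins at 19
    "pharmacotherapy_25": 25,  # cutoff used in most pharmacotherapy studies
}

def is_severe(hamd_score, convention):
    """True if the score reaches the 'severe' bound of the named convention."""
    return hamd_score >= SEVERE_CUTOFFS[convention]

def severe_by_convention(hamd_score):
    """Classify one score under every convention at once."""
    return {name: is_severe(hamd_score, name) for name in SEVERE_CUTOFFS}

for score in (19, 22, 25):
    print(score, severe_by_convention(score))
```

A patient scoring 22, for instance, is severely depressed under two of the conventions and not under the third, which is why results from studies using different cutoffs are difficult to compare.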

Fundamental to studies on the treatment implications of se¬ 
verity levels is the validity of the cutoffs on the HAMD to define 
the severity categories. In none of the discussion sections of 
the meta-analyses and pooled analyses of the reports on sever¬ 
ity and treatment outcome were questions raised about the 
cutoffs used to define the grades of severity. The APA's Handbook 
of Psychiatric Measures 93 cited only two small studies 
in support of the cutoff scores to identify severity subtypes, 
and neither study provided support for the APA guidelines. 
One was a study examining the validity of deriving a HAMD 
equivalent score on the Schedule for Affective Disorders and 
Schizophrenia 105 . This study did not attempt to determine the 
cutoff scores on the HAMD indicating grades of severity. The 
second study examined the association between HAMD scores 
and global ratings of severity in 59 depressed inpatients 106 . 
The authors did not derive (or recommend) cutoff scores cor¬ 
responding to severity levels. Thus, it is unclear why a cutoff of 
19 was recommended in the APA Handbook to identify severe 
depression. The UK National Institute for Health and Clinical 
Excellence (NICE) guidelines recommended a cutoff of 23 to 
identify severe depression on the HAMD, though no research 
was cited to support this recommendation 107 . 

Because of the limited amount of empirical research estab¬ 
lishing cutoff scores for bands of severity on the HAMD, and 
the significance accorded to severity by treatment guidelines, 
our clinical research group also examined this issue in 627 
psychiatric outpatients with MDD who were rated on the 
CGI 108 . The cutoff score on the HAMD that maximized the 
sum of sensitivity and specificity was 17 for the comparison of 
mild vs. moderate depression and 24 for the comparison of 
moderate vs. severe depression. Based on a review of the avail¬ 
able evidence, as well as the recommendations that a cutoff of 
7 be used to define remission, we recommended the following 
severity ranges for the 17-item HAMD: no depression (0-7); 
mild depression (8-16); moderate depression (17-23); and severe 
depression (≥24). 
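
The cutoff selection rule described above (maximizing the sum of sensitivity and specificity) and the resulting severity ranges can be sketched as follows. This is a minimal illustration, not the analysis code of the cited study: the scores and the clinician labels are hypothetical and were constructed so that the example is easy to follow, and the function names are assumptions.

```python
def best_cutoff(scores, is_positive):
    """Return the cutoff c that maximizes sensitivity + specificity,
    where a score >= c is counted as a positive (more severe) call."""
    positives = [s for s, p in zip(scores, is_positive) if p]
    negatives = [s for s, p in zip(scores, is_positive) if not p]
    best_c, best_sum = None, -1.0
    for c in sorted(set(scores)):
        sensitivity = sum(s >= c for s in positives) / len(positives)
        specificity = sum(s < c for s in negatives) / len(negatives)
        if sensitivity + specificity > best_sum:
            best_c, best_sum = c, sensitivity + specificity
    return best_c

def hamd_severity_band(score):
    """Map a 17-item HAMD total to the severity ranges recommended in the text."""
    if score < 0:
        raise ValueError("HAMD scores cannot be negative")
    if score <= 7:
        return "none"
    if score <= 16:
        return "mild"
    if score <= 23:
        return "moderate"
    return "severe"

# Hypothetical HAMD scores with clinician global ratings (True = rated severe).
scores = [18, 20, 22, 23, 25, 24, 24, 26, 27, 29, 31]
rated_severe = [False] * 5 + [True] * 6

print(best_cutoff(scores, rated_severe))
print([hamd_severity_band(s) for s in (7, 8, 16, 17, 23, 24)])
```

Because the cutoff is chosen against clinicians' global ratings, it inherits whatever implicit definition of severity those ratings reflect.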

Each of the above studies derived cutoff scores based on 
clinicians’ global judgments of severity. A limitation of these 
studies is that it is not known on what basis the global judgments 
of severity were made. Were some symptoms of depression 
considered better indicators of severity than other symptoms? 
For example, are symptoms characteristic of melancholic 
or endogenous depression given greater weight in clinicians' 
CGI ratings? Are clinicians' global ratings disproportionately 
influenced by degree of suicidality? Do clinicians 
consider psychosocial impairment in making their CGI rat¬ 
ings? We are unaware of any studies that have attempted to 
derive severity ranges on the HAMD, or any other depression 
scale for that matter, based on degree of impairment or level of 
suicidality. 

Another problem with depression symptom scales: 
different scales classify patients into different severity 
groups 

In clinical practice, self-report questionnaires are prefer¬ 
able to clinician-rated scales because they take less time to ad¬ 
minister. If self-report scales are to be used to classify patients 
into severity categories, and if treatment recommendations 
are to be based, in part, on severity classification, then it is im¬ 
portant for different scales to classify individuals similarly. 
However, because the content of measures differs, it would not 
be surprising if there were significant differences between meas¬ 
ures. 

Cameron et al 109 compared the PHQ-9 and the Hospital 
Anxiety and Depression Scale (HADS) severity classifications 
in a sample of primary care patients referred by their general 
practitioners in the UK to a mental health worker 110 . No infor¬ 
mation was provided regarding the patients’ psychiatric diag¬ 
noses. They found that the PHQ-9 overclassified severity com¬ 
pared to the HADS, with twice as many patients classified in 
the severe range. Other studies comparing the PHQ-9 and the 
HADS in medical patients found similar results 111,112 . How¬ 
ever, these studies lack an external validator and it is therefore 
unclear if the PHQ-9 overclassifies, or the HADS underclassifies, 
severity. A second study by Cameron et al 107 included the 
second edition of the Beck Depression Inventory (BDI-II) 113 
along with the PHQ-9 and HADS, and also assessed the patients 
with the HAMD. The participants were primary care patients 
who had been diagnosed by their general practitioner 
with depression. Both the PHQ-9 and BDI-II overclassified severity 
compared to the HAMD, whereas the HADS underclassified 
severity. 

We are aware of only one study that compared self-report 
scales in a sample of psychiatric outpatients with MDD 114 . Our 
clinical research group compared severity classification on 
three measures that assess the DSM-IV/DSM-5 symptom cri¬ 
teria for MDD: the Clinically Useful Depression Outcome Scale 
(CUDOS) 79 , the Quick Inventory of Depressive Symptomatol¬ 
ogy (QIDS) 85 , and the PHQ-9 71 . The patients were also rated 
on the 17-item HAMD. We 
found that the correlations between the HAMD and all three 
self-report scale scores were nearly identical, and the average 
correlation among the three self-report scales was .73. However, 
the scales significantly differed in their distribution of patients 
into severity categories. Approximately one-third of the 
patients scored in the mild range on the HAMD and CUDOS, 
whereas approximately 10% of the patients were mildly de¬ 
pressed according to the PHQ-9 and QIDS. On the CUDOS 
and HAMD, moderate depression was the most frequent se¬ 
verity category, whereas on the PHQ-9 and QIDS the majority 
of the patients were classified as severe. The majority of the 
patients in the moderate range on the HAMD were in the se¬ 
vere range on the PHQ-9 and QIDS. Significantly fewer pa¬ 
tients were classified as severely depressed on the CUDOS 
compared to the PHQ-9 and QIDS. 

With the three self-report measures being highly correlated 
with each other, and equally correlated with the HAMD, what, 
then, might account for the marked differences between scales 
of similar content in the distribution of patients into severity 
groups? 

The cutoffs on the three scales to define the severity groups 
were derived in different ways, and this was likely responsible 
for the differences between the scales in severity classification. 
For example, Kroenke et al 71 indicated that the cutoff scores on 
the PHQ-9 were chosen for the pragmatic reason of making 
them easier for clinicians to recall. They also noted that alterna¬ 
tive cutoffs did not increase the association between increasing 
PHQ-9 severity and indices of construct validity. When select¬ 
ing the cutoff scores to define the severity ranges on the PHQ-9, 
the developers of this questionnaire did not consider the po¬ 
tential impact of the broadness by which severity ranges were 
defined and how this might impact on treatment recom¬ 
mendations of official treatment guidelines. 

Kroenke et al 71 indicated that, when severity groupings 
based on different cutoffs are equally associated with external 
variables, then the cutoffs can be chosen based on their ease 
of recall. We disagree with this reasoning. For all scales meas¬ 
uring the severity of depressive symptoms, the thresholds dis¬ 
tinguishing patients with mild, moderate and severe depres¬ 
sion do not represent well-demarcated lines separating the 
severity subtypes. As with other areas of psychopathology, the 
severity of depression better corresponds to a dimensional 
than a categorical model of classification 115 . Thus, alternative 
cutoffs to categorize severity groupings are likely to also be 
valid when the groupings are compared on an external variable 
such as psychosocial functioning. However, one should not be 
cavalier about the choice of cutoffs, because they impact on 
the relative broadness of each of the severity categories. 

If clinicians are to follow official treatment guidelines' re¬ 
commendations and base initial treatment selection on the se¬ 
verity of depression, then it is important to have a consistent 
method of determining depression severity. The marked dis¬ 
parity between standardized self-administered scales in the 
classification of depressed outpatients into severity groups in¬ 
dicates that there is a problem with the use of such instruments 
to classify depression severity. If official treatment guideline re¬ 
commendations were followed, then use of measures such as 
the QIDS and PHQ-9, which broadly define the severe category, 
would result in greater reliance on medication in preference 
to psychotherapy as the first line treatment option for 
MDD. Caution is thus warranted in the use of these scales to 
guide treatment selection until the thresholds to define severity 
ranges have been better established empirically. 

The importance of severity of depression in treatment: 
official guideline recommendations 

Notwithstanding the aforementioned problems with con¬ 
ceptualizing the severity of depression, and defining the cut¬ 
offs on scales for severity levels, depression severity is an im¬ 
portant consideration in treatment decision-making. The se¬ 
verity of depression has influenced treatment recommenda¬ 
tions in official guidelines. The third edition of the APA's 
guidelines for the treatment of MDD recommends both psychotherapy 
and pharmacotherapy as monotherapies for depression 
of mild and moderate severity, and pharmacotherapy 
(with or without psychotherapy) for severe depression 1 . The 
NICE updated guidelines for the treatment and management 
of depression discourage the use of antidepressant medication 
as the initial treatment option for mild depression, and recom¬ 
mend medication together with empirically supported psy¬ 
chotherapy for moderate and severe depression 2 . As reported 
by van der Lem et al 116 , treatment guidelines in the Nether¬ 
lands also recommend pharmacotherapy as the first treatment 
option for severely depressed patients, and either pharmaco¬ 
therapy or psychotherapy for mildly and moderately depressed 
patients. While the recommendations in these guidelines are 
not entirely consistent, they are unanimous in recommending 
medication as the treatment of choice for severe depression. 

The treatment significance of severity has been studied in 
several different ways. There are controlled studies, effective¬ 
ness studies, pooled analyses, and meta-analyses examining 
the impact of severity on particular treatments 117-122 , comparing 
treatments across a range of severity 99,123-127 , comparing 
medication and placebo across a range of severity 128,129 , comparing 
psychotherapy and control groups across a range of severity 130,131 , 
comparing treatments amongst severely depressed 
patients 96,101,102,132 , and examining whether severity predicts 
short-term outcome 42,133-135 , treatment resistance 136 , longer-term 
outcome 40,137-139 , and relapse 38 . 

Severity of depression and pharmacotherapy 

In the past decade, questions have been raised whether se¬ 
lective serotonin reuptake inhibitors (SSRIs) and other new 
generation antidepressants are effective in non-severe depres¬ 
sion. Khan et al 94 analyzed 45 clinical trials in the FDA data¬ 
base and found that in studies with a mean baseline 17-item 
HAMD score of 24 or less there was little evidence that anti¬ 
depressant medication was superior to placebo, whereas in 
studies with a mean baseline HAMD score of 28 or greater 
there was clear evidence that medication was superior to placebo. 
Kirsch et al 92 similarly examined the FDA database, and 
they also examined the efficacy of antidepressants as a function 
of mean baseline HAMD score in the trial. Their results 
largely replicated the findings of Khan et al 94 that drug-placebo 
differences were largest in the studies with the highest 
baseline severity (i.e., HAMD ≥28). Kirsch et al 92 found that 
antidepressants were significantly more effective than placebo 
in the less severe cohorts, but they considered the difference 
in response to be modest and clinically insignificant. 

In contrast to the analyses of the FDA database by Kirsch 
et al 92 and Khan et al 94 , Fournier et al 95 pooled individual pa¬ 
tient data from six published studies. Kirsch et al and Khan 
et al used aggregated mean scores for an entire study as the 
unit of analysis. That is, they compared studies with different 
mean severity scores at baseline. The problem with this ap¬ 
proach is that a group of patients with a mean score in the se¬ 
vere range will also include some patients in the mild and 
moderate severity ranges. Likewise, a group of patients with a 
mean score in the mild or moderate severity range will include 
some patients scoring in the severe range. Pooling individual 
patient data avoids the problem of severity group misclassification 
at the individual patient level. Fournier et al 95 replicated 
the finding that drug-placebo differences were clinically 
significant only for severely depressed patients, and found 
only a small effect size for mildly and moderately depressed 
patients. 
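
The misclassification that arises when a trial's mean score is used as the unit of analysis can be illustrated with a small Python sketch. The baseline scores are hypothetical, and the inclusive threshold of 24 for "severe" is an assumption chosen only for illustration.

```python
from statistics import mean

# Assumed inclusive threshold for "severe" on the 17-item HAMD (illustration only).
SEVERE_CUTOFF = 24

def label(score, cutoff=SEVERE_CUTOFF):
    """Label a score (or a mean score) as severe or non-severe."""
    return "severe" if score >= cutoff else "non-severe"

# Hypothetical baseline HAMD scores for the patients in one trial.
baseline_scores = [16, 19, 22, 27, 29, 30, 31, 32]

# Trial-level analysis: every patient inherits the label of the trial mean.
trial_label = label(mean(baseline_scores))

# Patient-level analysis: each patient is labeled from his or her own score.
patient_labels = [label(s) for s in baseline_scores]
misclassified = sum(pl != trial_label for pl in patient_labels)

print(mean(baseline_scores), trial_label)
print(misclassified, "of", len(baseline_scores), "patients differ from the trial-level label")
```

In this sketch the trial mean falls in the severe range even though three of the eight patients do not, which is the problem that pooling individual patient data avoids.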

More recently, other pooled analyses of patient level data (ra¬ 
ther than aggregated data from a trial) have been conducted. 
Using pharmaceutical company data bases, these analyses in¬ 
cluded all studies of a product, thereby avoiding the bias inher¬ 
ent in examining only published studies 140 . The results of three 
large, pooled analyses of published and unpublished studies, 
which included between 4,000 and 10,000 subjects each, indicated 
that antidepressants are effective across a range of severity 90,129,141 . 
These analyses, and the controversy that has been 
stirred regarding the efficacy of antidepressants, highlight the 
impact that considerations of severity might have on clinical 
practice. 

Severity of depression and medication or psychotherapy 
as first line treatment 

A second important severity related treatment question is 
whether the severity of depression should be used as the basis 
for recommending medication or psychotherapy as first line 
treatment. More specifically, the question is whether patients 
with severe depression should preferentially be treated with 
medication. A related question is whether psychotherapy is 
beneficial for severely depressed patients. 

Symptom severity as a moderator of treatment response has 
been the subject of ongoing debate since the publication of the 
results from the US National Institute of Mental Health Treatment 
of Depression Collaborative Research Program (TDCRP), 
suggesting that psychotherapy was not as effective as medication 
in the acute treatment of severe depression 91,142 . The first 
meta-analysis of studies directly comparing psychotherapy 
and pharmacological interventions included 30 published 
studies of more than 3,000 patients 143 . A meta-regression anal¬ 
ysis examining whether effect sizes were associated with mean 
baseline scores on the HAMD or BDI found no evidence that 
baseline severity was associated with differential treatment 
outcome. A comparison of effect sizes in studies with baseline 
HAMD scores below 20 vs. 20 and above also found no differ¬ 
ences. 

A meta-analysis of 132 controlled psychotherapy studies of 
more than 10,000 patients found that greater mean baseline 
symptom severity did not predict poorer response 130 . More 
recently, Weitz et al 144 pooled individual patient data from 16 
studies comparing antidepressants and cognitive behavior 
therapy. They defined the severe group according to the APA 
(HAMD ≥19) and NICE (HAMD ≥23) recommendations. Increased 
severity was associated with significantly lower remission 
rates (but not response rates) in both the medication and 
psychotherapy treatment conditions. Severity was not associ¬ 
ated with differential treatment outcome, thus confirming the 
results of a prior pooled analysis based on a smaller number 
of studies 89 . In a follow-up study, the authors conducted a 
pooled analysis focused on the five studies that used placebo 
as the control condition 131 . The results were consistent with 
the larger pooled analysis: baseline symptom severity was not 
associated with change in symptom severity scores from base¬ 
line to endpoint between the cognitive behavior therapy and 
pill placebo groups. 

The results of these more recent meta-analyses, based on 
severity classification according to symptom rating scales, are 
thus not consistent with official treatment guidelines which 
recommend medication as the first line treatment for severe 
depression. 

SEVERITY OF PERSONALITY DISORDERS 

Severity is clearly of import to PDs, though the current diag¬ 
nostic systems do not include any formal severity ratings. PD 
patients identified as "severe” are more likely to exhibit high co¬ 
morbidity with other psychiatric diagnoses, particularly mood, 
anxiety, substance use 145 , and other PDs 146 . So-called "severe" 
cases are often in treatment for protracted periods of 
time 147-149 , exhibit higher rates of hospitalization and suicide 
attempts 150 , and self-injure with greater frequency 151 . They 
are likely to be incarcerated, unable to hold down a job, and 
have failed relationships 152 . It is generally agreed that they 
may present a public health burden, and therefore should be 
identified early and get treated often 3,4,153 . 

Nonetheless, the question remains: what is meant by “se¬ 
vere” PD? Severity has been assessed by counting the number 
of comorbid PD diagnoses overall, with higher comorbidity 
indicating higher severity 152,154-156 . However, this may better 
reflect the severity of overall personality pathology rather than 
the severity of a particular PD. More severe cases of personal¬ 
ity pathology may further be identified by case complexity and 
specific comorbidity patterns. The main section of DSM-5 
(i.e., Section II) identifies PDs as occurring in one of three clus¬ 
ters. Tyrer and Johnson 157 proposed that individuals with co¬ 
morbid PDs from more than one cluster are more severe than 
those with comorbid PDs from the same cluster. The authors 
further identify antisocial PD as the most severe PD based on 
risk to others. Therefore, the most severe cases must be diag¬ 
nosed with antisocial PD as well as PDs from other clusters. 
Using this model, severity of PD was associated with con¬ 
duct disorder, criminal behavior, homelessness, institutional¬ 
ization, unemployment, and delinquent behavior in child¬ 
hood. 

Severity of a specific PD may be measured by counting the 
number of criteria met. For example, cases of borderline PD 
for which nine criteria are endorsed would be viewed as more 
severe than patients endorsing only five criteria 147 . However, 
results from our clinical research group did not support this 
hypothesis, finding no differences in comorbidity or psycho¬ 
social functioning based on criteria count for patients diag¬ 
nosed with borderline PD 158 . Alternatively, severity can be de¬ 
fined by the frequency of symptoms. For instance, patients 
with borderline PD who engage in self-injury multiple times 
daily would be more severe than those reporting only monthly 
self-injury 151 . 

Specific PDs have even been identified as more or less se¬ 
vere than others. Kernberg and Caligor 159 organized PDs into 
a hierarchy ranging from "more severe" (e.g., borderline PD) 
to less severe (e.g., obsessive compulsive PD, dependent PD). 
There has also been a strong push for conceptualizing PDs 
using constellations of pathological personality traits. From this 
perspective, a “severe” PD symptom or trait may be defined as 
one that is statistically extreme, or existing in only a very small 
proportion of the population 160 . 

Treatment research of "severe" personality disorders primarily 
emphasizes symptom characteristics (frequency, persistence, 
intensity) and functional impairment (social/occupational, 
or outcomes such as imprisonment) 161-163 . Maden and 
Tyrer 162 identify a category of "dangerous and severe" PD, 
which is characterized by having a high risk of causing unre¬ 
coverable harm to others. Confusingly, the first criterion for 
having a "dangerous and severe" PD is already being diag¬ 
nosed with a "severe disorder of personality" which remains 
undefined itself. The authors do not clarify what severity 
means at the criterion level, although it appears this definition 
is legal in origin, and refers primarily to psychopathy and not 
to PDs as they are traditionally defined. 

Severity of personality disorders and functioning 

Although severity has been defined in various ways in the 
PD literature, a general consensus appears to have emerged 
that PD severity is inherently linked with level of maladaptive 
functioning 164-169 . It is widely acknowledged that extreme trait 
or symptom variation is insufficient to diagnose PDs or to dic¬ 
tate diagnostic severity. Rather, the emphasis lies in having ex¬ 
treme personality traits in the presence of impairment associ¬ 
ated with those traits. Unlike physical illnesses, or even depres¬ 
sion, which are more focused on symptom presentation, per¬ 
sonality diagnoses are intertwined with adaptive functioning. 
Like depression, PDs by definition must result in "distress or 
impairment" to be diagnosed 33 . In contrast to depression, how¬ 
ever, the symptom criteria for diagnosing PDs include both af¬ 
fective/cognitive/emotional and functional components. For 
example, impoverished occupational and financial functioning 
is included in symptom criteria for antisocial PD, and failure to 
engage in social and leisure activities is part of the criteria for 
obsessive-compulsive PD. 

The interrelationship between functional impairment and 
personality leads many to conclude that PD severity is a com¬ 
bination of extreme personality disturbance and maladaptive 
functioning associated with that disturbance 165,169 . In fact, functioning 
is so fundamental to determining PD presence and severity 
that some authors argue that assessing extreme traits/symptoms 
is unnecessary 170-173 . Thus, one need not demonstrate 
symptom severity if sufficient impairment is judged to be 
present. However, the dysfunction must be determined as due 
to the presence of the personality features, even if they are not 
extreme. For example, using the multiaxial DSM-IV, Livesley 174 
proposed defining PD as present diagnostically on Axis I, and 
coding personality traits separately on Axis II. Widiger and 
Trull 169 proposed a similar model, only using the GAF score on 
Axis V as a stand in for severity. 

Taken together, these models converge on defining severity 
as a generalized, adaptive failure of an intrapsychic system re¬ 
quired to fulfill daily life tasks 166 . Although specific areas of im¬ 
pairment differ, there is convergence on impairment in three 
broad areas: identity formation, self-control (or direction), and 
interpersonal relationships 164 . However, some research indi¬ 
cates that pathological personality traits and functioning are 
so closely intertwined that they may not represent distinct domains 175 .


Severity of personality disorders as described in DSM-5 
and ICD-10 

There is no clear mention of severity with respect to PDs in 
the main section II of DSM-5 33 . However, the overall descrip¬ 
tion of PDs includes severity indicators common to other dis¬ 
orders. For example, PDs are specifically noted to be inflexible, 
maladaptive, pervasive, and associated with "clinically significant"
functional impairment or subjective distress. Functional
impairment is an indicator of severity in many physical and 
psychiatric disorders; pervasiveness is a severity indicator for 
depression; and subjective distress is identified as indicating a
"severe case" for disorders of mood and sexual function. As it
stands, there is no official method for indicating PD severity in
DSM-5. 

Section III (Emerging Measures and Models) of DSM-5 in¬ 
cludes an alternative model for diagnosing PDs. Diagnosis is 
defined via a combination of severity levels of dysfunction 
and elevated personality traits, and severity is determined 
principally by dysfunction associated with elevated traits 33 . 
This model does not designate a measure for overall severity, 
but "moderate or greater impairment" is required for diagnosis.
Impairment is operationalized as falling into one of five
levels, with the extreme end indicative of severe personality 
pathology. The Level of Personality Functioning Scale (LPFS) 
is proposed to rate impairments in functioning, and therefore 
also PD severity. Ratings are made for self (identity and self- 
direction) and interpersonal (empathy and intimacy) func¬ 
tioning. Levels include: 0 (little or no impairment), 1 (some 
impairment), 2 (moderate impairment), 3 (severe impair¬ 
ment), 4 (extreme impairment). Individuals with extreme im¬ 
pairment are described as having an impoverished, unclear 
identity and self-direction with maladaptive self-concept, and 
completely lacking capacity to engage interpersonally. 
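
As a minimal illustrative sketch, and not part of DSM-5 itself, the five LPFS levels and the "moderate or greater impairment" threshold described above can be written in Python. The names LPFSLevel and meets_impairment_threshold are hypothetical, and the sketch assumes that a clinician has already assigned a single overall level; how the self and interpersonal ratings are combined into that level is not specified here.

```python
from enum import IntEnum


class LPFSLevel(IntEnum):
    """Hypothetical encoding of the five impairment levels listed in the text."""
    LITTLE_OR_NONE = 0
    SOME = 1
    MODERATE = 2
    SEVERE = 3
    EXTREME = 4


def meets_impairment_threshold(level: LPFSLevel) -> bool:
    """Return True when impairment is 'moderate or greater' (level 2 or above)."""
    return level >= LPFSLevel.MODERATE


# Usage: a rating of "some impairment" falls below the threshold,
# whereas "severe impairment" meets it.
below = meets_impairment_threshold(LPFSLevel.SOME)
above = meets_impairment_threshold(LPFSLevel.SEVERE)
```

Because the levels are ordinal, an integer-backed enumeration keeps the comparison explicit while preserving the 0 to 4 coding used in the text.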

Interestingly, DSM-5 Section III also includes discussion of 
an additional measure of personality traits, the Personality In¬ 
ventory for DSM-5 176 . The items are clearly trait content re¬ 
lated; however, the measure provides an overall summed 
score identified as measuring "overall personality dysfunction".
The identification of extreme traits as indicative of dysfunction
is curious, but not inconsistent with the significant
overlap between functioning and PD traits/symptoms found 
elsewhere in the literature 175 . Nonetheless, this suggests that 
extreme traits are at least indicative of extreme dysfunction, 
which is the primary index of severity in this model. 

Similar to the DSM-5, the ICD-10 does not make mention of 
severity in PD classifications. However, several papers have 
been published on changes proposed for ICD-11, which are 
substantial. Most notably, the primary classification of PDs will 
change to one based on severity of personality disturbance. 
Description of PD traits or features will be optional and will not be
required for diagnosis 3,4 .

Consistent with the larger literature, the proposed changes 
to the ICD-11 conceptualize severity primarily as dysfunction, 
or the personality-related problems experienced by the indi¬ 
vidual. Again, five levels of severity are proposed, though they 
vary slightly from those in the DSM. Summed together, sever¬ 
ity levels are dictated first by pervasiveness of the impairment 
(across situations or limited), and secondarily by the number 
of problematic personality traits (multiple or single). At the 
highest level of severity, risk to self or others is also assessed. 
Thus, the most severe cases are identified by functioning 
above all else. Symptoms/traits and risk of harm are second¬ 
ary, but also considered. Unlike the DSM-5 alternative model 
proposal, dysfunction in self and identity is not included in se¬ 
verity ratings 3,4 . At the time of this writing, the ICD-11 has not 
yet been published, and therefore these definitions should be 
considered provisional. Nonetheless, the emphasis on functioning
via severity ratings has been criticized for insufficient
research establishing its reliability and validity 177 . 

Measures of personality severity 

As early as 1996, Tyrer and Johnson 137 developed a five- 
point scale assessing disorder severity similar to that in the 
ICD-11 proposal. Ratings were made based on information de¬ 
rived from a trait personality measure, the Personality Assess¬ 
ment Schedule (PAS) 153 . Thus, severity was weighted more to¬ 
wards extremity on traits than on functioning. The PAS has 
also been used to classify individuals into the four PD catego¬ 
ries proposed by Tyrer and Johnson 157 : no PD, personality dif¬ 
ficulty, simple PD, complex PD. PAS severity designations are 
primarily based on the frequency of DSM-IV and ICD-10 cat¬ 
egories, and have been used in studies predicting treatment 
outcomes, albeit with mixed findings 178 . The General Assess¬ 
ment of Personality Disorder 179 has been used as an index of 
severity in multiple studies, and provides two main scales of 
severity - self-pathology and interpersonal problems - both of 
which reflect functional impairment as defined by the DSM-5 164,180,181 .
Similarly, the Severity Indices of Personality Problems 173
defines severity as a combination of impoverished self
and interpersonal functioning. 

Relatively few measures of severity exist for individual PDs, 
and these largely focus on borderline PD. For example, the 
Borderline Personality Disorder Severity Index (BPDSI) 151,182
is a semi-structured clinical interview that operationalizes se¬ 
verity primarily by frequency of borderline PD symptom be¬ 
haviors over the preceding three months. Frequency of symp¬ 
toms is rated from 0 (never) to 10 (daily). Severity is averaged 
across these scores, yielding severity scores for individual bor¬ 
derline PD criteria as well as the diagnosis overall. Thus, the 
BPDSI largely measures severity as a function of symptom fre¬ 
quency, though many of the items also ask about behaviors 
that have implied functional consequences (e.g., going out in¬ 
stead of working). 
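
The frequency-based scoring described above can be sketched as follows. This is an illustration under stated assumptions, not the published BPDSI scoring procedure: the function names and the criterion labels in the usage example are hypothetical, and taking the overall score as the mean of the criterion scores is an assumption made only to mirror the description of averaging given here.

```python
from statistics import mean
from typing import Dict, List


def criterion_scores(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the 0 (never) to 10 (daily) frequency ratings within each criterion."""
    scores = {}
    for criterion, items in ratings.items():
        if not items:
            raise ValueError(f"no item ratings for criterion {criterion!r}")
        if any(not 0 <= rating <= 10 for rating in items):
            raise ValueError("frequency ratings must lie between 0 and 10")
        scores[criterion] = mean(items)
    return scores


def overall_score(ratings: Dict[str, List[int]]) -> float:
    """Assumed overall index: the mean of the per-criterion averages."""
    return mean(list(criterion_scores(ratings).values()))


# Usage with made-up ratings for two hypothetical criteria.
example = {"impulsivity": [2, 4], "anger": [10, 0, 5]}
per_criterion = criterion_scores(example)
total = overall_score(example)
```

Averaging within criteria first gives each criterion equal weight regardless of how many items it contains, which is one reasonable reading of the description but not the only one.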

Consistent with the severity of personality pathology often 
being linked with impairments in functioning, PD treatment 
outcome research has often focused on the degree to which vari¬ 
ous treatment approaches (e.g., dialectical behavioral therapy, 
mentalization-based treatments, transference-focused psycho¬ 
therapy) improve day-to-day functioning and reduce specific, 
concrete maladaptive behaviors 147,183,184 . For instance, in the
extensive borderline PD treatment literature, change in person¬ 
ality pathology is often assessed using measures such as the 
Zanarini Rating Scale for Borderline Personality Disorder 185 and 
the Barratt Impulsiveness Scale 186 . However, reductions in
suicide attempts, self-harm behavior, and reliance on psychiatric
emergency treatment services are often primary treatment
outcome measures, as are improvements in maintaining
meaningful relationships and in workplace functioning 147,183,184,187,188 .


Although the PD treatment literature has focused primarily 
on the treatment of borderline PD, other PDs also have received 
some attention, with functional impairment being identified as 
central to treatment outcomes. For instance, transference-fo¬ 
cused psychotherapy has demonstrated some benefit for pa¬ 
tients with comorbid narcissistic and borderline PD, and this 
treatment approach emphasizes interpersonal functioning in 
personal and workplace relationships when assessing out¬ 
come 189 . Treatment research on antisocial PD has focused on 
subsequent substance use and arrests 190 . Thus, across the 
treatment of various PDs, treatment outcome and a reduction
in "severity" is understood not just as symptom reduction,
but also reduction in specific deleterious behaviors (e.g., self- 
harm) and the promotion of interpersonal functioning and 
specific prosocial behaviors (e.g., maintaining employment). 

TRANSDIAGNOSTIC MODELS AND SEVERITY: THE 
EMERGENCE OF PSYCHOPATHOLOGY SPECTRA 

Many of the questions asked above as to how to compare 
the validity of depression scales in measuring severity also 
apply to determining if different diagnoses confer differential 
levels of severity. The likelihood of meeting criteria for differ¬ 
ent diagnoses confers standing on underlying genetic liabil¬ 
ities 191,192 . This is important to consider given that individuals 
who meet criteria for one diagnosis are very likely to meet cri¬ 
teria for multiple other diagnoses 193 , such that various diag¬ 
noses may be thought to be manifestations of underlying spec¬ 
tra (e.g., antisocial PD, narcissistic PD and substance use all 
reflect an underlying externalizing spectrum). 

Research examining the relations amongst various internal¬ 
izing diagnoses characterized by subjective distress and fear 
suggests that it may be "easier" for individuals to meet criteria 
for diagnoses such as MDD than for more "severe" disorders 
such as generalized anxiety and panic disorders 194 . Put differ¬ 
ently, meeting criteria for generalized anxiety or panic disorder 
reflects higher standing on the internalizing dimension than
would simply meeting criteria for MDD. Interestingly, Krueger 
and Finger 194 also found that high standing on the internalizing 
dimension was linked robustly to lifetime number of inpatient 
hospitalizations and past month psychosocial functioning. 

Other more recent research has also linked "severity" on 
the internalizing spectrum to key outcomes. For instance, Ea¬ 
ton et al 195 found that the likelihood of meeting criteria for 
various depressive disorders, anxiety disorders, and bipolar 
disorders can be represented by an underlying continuum. In¬ 
dividuals with high scores on this dimension, who would be 
characterized as having more "severe" levels of internalizing
psychopathology, would thus be likely to meet criteria for 
many diagnoses and to report a broad range of symptoms 
(e.g., depressed mood, worry, concentration difficulties, irrita¬ 
bility) characterizing the various DSM diagnoses defining this 
dimension. 
Eaton et al 195 presented evidence indicating that scores on
the internalizing spectrum predicted outcomes such as the fu¬ 
ture occurrence of internalizing symptoms (e.g., depressed 
mood, worry), suicide attempts, angina/chest pain, and ul¬ 
cers. Moreover, standing on this underlying dimensionally- 
based internalizing spectrum predicted these outcomes much 
more strongly than did DSM-based conceptualizations of vari¬ 
ous internalizing disorders (e.g., MDD, generalized anxiety 
disorder), thereby providing evidence for the utility of this ap¬ 
proach in capturing severity as it relates to important out¬ 
comes such as suicidality and physical health concerns 195 . 

In regard to other forms of psychopathology, Krueger et al 196 
presented evidence indicating that symptoms and behaviors 
defining personality and substance use disorders can be cap¬ 
tured by an underlying externalizing dimension. Other re¬ 
search also supports the presence of this underlying latent 
externalizing dimension, which explains why antisocial behaviors
(e.g., various unlawful behaviors) and traits (e.g., impulsivity,
callousness) and substance use issues are likely to
co-occur 191,197 . Carragher et al 197 presented findings suggest¬ 
ing that meeting criteria for some disorders (e.g., cocaine 
dependence) confers higher standing and severity on this 
underlying externalizing dimension than other "less severe" 
disorders (e.g., nicotine and alcohol dependence). Similarly, 
overlap in disorders such as schizophrenia and schizotypal PD 
appears to be reflected by a thought disorder spectrum 191,198 . 
Standing on this spectrum has been linked to functional im¬ 
pairment and illness course 198 . 

Going forward, it will be important for research to explicate
how level of severity (i.e., how likely
an individual is to meet criteria for different disorders and to
meet criteria for "difficult" disorders such as cocaine dependence
in the case of the externalizing spectrum) captured by
broad internalizing, externalizing, and thought disorder di¬ 
mensions predicts illness course and other key outcomes re¬ 
lated to morbidity and mortality. These dimensions account 
for diagnostic comorbidity amongst various disorders and have 
been shown to predict various outcomes more strongly than 
diagnostic status on various DSM disorders, suggesting im¬ 
portant merits to this approach 191,195 . In this regard, the Hier¬ 
archical Taxonomy of Psychopathology (HiTOP) has emerged 
as a dimensionally-based alternative to the DSM-5 191,199 . Thus, 
it will be important to determine the degree to which this 
framework adequately captures psychopathology "severity", 
however severity is defined, and is useful for researchers and 
practitioners. 

CONCLUSIONS 

The issue of severity has great clinical importance. Severity 
influences decisions about level of care and affects decisions 
to seek government assistance due to psychiatric disability. In 
outpatient settings, the importance of severity is reflected in 
the controversy about the efficacy of antidepressants across
the spectrum of depression severity, and whether patients
with severe depression should be preferentially treated with 
medication rather than psychotherapy. 

We began this paper with a series of questions as to how the 
severity of psychopathology should be conceptualized. Some 
authors have suggested that the core indicator of the severity of 
mental illness is functional disability 200 . The DSM-5 has defined 
the severity of different disorders in different ways. Our review 
of the literature for depression and PDs demonstrated that re¬ 
searchers have adopted a myriad of ways of defining severity. 
The severity of depression has predominantly been defined ac¬ 
cording to scores on symptom rating scales. To be sure, there is 
some variability in how items are rated (i.e., symptom intensity 
vs. symptom frequency vs. symptom persistence), as well as 
some variability in the range of symptoms assessed by different 
measures of depression. Irrespective of the precise manner by 
which symptom severity is determined, most of the literature 
on the severity of depression is based on the parameters of 
symptoms. By contrast, the core of personality pathology is 
intertwined with its impact on functioning. Distinguishing ex¬ 
treme variants of personality traits from functioning has been 
challenging; therefore, functional impairment has been fundamental
to conceptualizing the severity of PDs.

Because the functional impact of symptom-defined dis¬ 
orders such as MDD depends on factors unrelated to the dis¬ 
order such as self-efficacy, resilience, coping ability, social 
support, cultural and social expectations, as well as the re¬ 
sponsibilities related to one's primary role function and the 
availability of others to assume those responsibilities, we 
would argue that the severity of such disorders should be de¬ 
fined independently from functional impairment. To those 
who would disagree, consider the following scenario: two in¬ 
dividuals have an upper respiratory tract infection. They have 
the same elevation in body temperature, sneeze and cough 
with the same frequency, have the same level of mucus pro¬ 
duction and nasal discharge, and the same viral load. And the 
symptoms last for the same number of days. In short, they 
have the same intensity, frequency, and persistence of symp¬ 
toms. Yet one person misses work for a week and the other 
does not miss work. Does the person who missed work have a 
more severe upper respiratory tract infection? 

A distinction could be made between defining severity at 
the level of a disorder vs. overall global illness severity. As 
stated, at the level of disorder, severity should be determined 
by the factors that are intrinsic to the disorder. Thus, the sever¬ 
ity of depression should be determined by the intensity, fre¬ 
quency, and/or persistence of the depressive symptoms. And 
the same is true for other disorders such as generalized anxiety 
disorder, post-traumatic stress disorder, mania/hypomania, 
and tic disorder. The severity of panic disorder should be 
based on the intensity and frequency of panic attacks. The se¬ 
verity of premature ejaculation should be based on time to 
ejaculation, the severity of hypoactive sexual desire based on 
the intensity (or lack thereof) of desire, the severity of binge 
eating disorder on the frequency and intensity of binges, etc.
The episodic nature of some psychiatric disorders and symptoms
presents some measurement challenges. There may be
day-to-day variability in symptom intensity as well as symp¬ 
tom persistence through the course of the day. Symptom fre¬ 
quency varies by disorder. Too little research has compared 
the validity of symptom intensity, frequency, and persistence 
assessments. 

Severity, however, can be considered from another per¬ 
spective: at the level of overall illness. A patient with depres¬ 
sion, borderline PD, some anxiety disorders, substance use 
disorder and an eating disorder has a severe illness. It would 
likely be difficult to parse the levels of functional impairment 
to the separate disorders. The severity of the symptoms of de¬ 
pression may not be high, but the patient is nonetheless se¬ 
verely ill. How to take into account comorbidity when deter¬ 
mining the severity of individual disorders is not clear. A glob¬ 
al rating of overall illness severity was included in DSM-III 
through DSM-IV, but dropped from DSM-5. The global rating 
of illness severity can be considered to be akin to the compo¬ 
site measures of physical illness severity, described in the 
introduction, that have been used to predict mortality in 
emergency room and hospitalized patients. The problem with 
the GAF was that it was a single rating that required consider¬ 
ation of multiple constructs, including symptom frequency, 
type of symptom, level of impairment, suicidality, ability to 
care for oneself, and psychosis. Because of its complexity, 
there were problems with the reliability of its ratings 201 . Per¬ 
haps the dimensionally based measures of psychopathology 
articulated in HiTOP will yield clinically meaningful and use¬ 
ful approaches towards characterizing overall severity. 

In the future, research on severity needs to be clear as to what 
correlates of a measure are expected. We noted above that too 
little research has compared the validity of symptom intensity, 
frequency, and persistence assessments. The question is how to 
evaluate validity. Should severity be a predictor of outcome? 
Should it help match patients to appropriate treatments or ap¬ 
propriate levels of care? Should it predict mortality? Should it re¬ 
flect underlying pathophysiology? Should it confer genetic risk? 
Should it be used to guide the allocation of finite resources at ei¬ 
ther the insurance company or governmental funding agency 
level? 

There is a wealth of papers in the psychiatric, medical and
epidemiological literatures that refer to depression severity in 
the title and examine the correlates of a measure of depressive 
symptoms. But how to best measure severity has largely not 
been the subject of study. Numerous scales have been devel¬ 
oped that purport to measure the severity of depression. When 
the authors of these scales discuss the reason behind develop¬ 
ing their measure, the explanation usually focuses on item 
content and rarely discusses the reason for choosing a particu¬ 
lar rating approach. Perhaps it does not make a meaningful 
difference how items are scaled. Perhaps the exact content of a 
scale does not make a meaningful difference either. Perhaps 
simplicity and clinical utility should trump any minor incre¬ 
mental validity that one measure shows over another. 


However, some research suggests otherwise. The ability to 
detect differences between medication and placebo may be related
to the content of the measure used 202 . Scales differ in severity
classification 111,112,114 , and treatment guidelines suggest
that severity be used to select among treatment alternatives 1,2 . 
Thus, severity has real world implications in both the research 
and clinical communities. It is our hope that this paper stimu¬ 
lates more consideration and research into the issue of how to 
best conceptualize and measure the severity of psychiatric dis¬ 
orders. 